Description

The present invention relates to a storage control
system for use in a computer system comprising a main
memory, a processor and a cache interposed between
main memory and the processor.

In high performance computers, caches serve to reduce
the observed latency to memory. The cache provides a
relatively small but very high performance memory very
close to the processor. Data from the much larger but
slower main memory is automatically staged into the cache
by special hardware on a demand basis, typically in units
of transfer called "lines" (ranging, for example, from 32
to 256 bytes). If the program running on the computer
exhibits good locality of reference, most of the accesses
by the processor are satisfied from the cache, and the
average memory access time seen by the processor will be
very close to that of the cache; e.g., on the order of one
to two cycles. Only when the processor does not find the
required data in cache does it incur the "cache miss
penalty", which is the longer latency to the main memory;
e.g., on the order of twenty to forty cycles in computers
with short cycle times. For a given cache structure, a
program can be characterized by its "cache hit ratio"
(CHR) which is the fraction of the accesses that are
satisfied from the cache and hence do not suffer the
longer latency to main memory.
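
The dependence of the average access time on the cache hit ratio can be illustrated with a simple weighted average. The following is a minimal sketch only: the function name and the example latencies (1 cycle for the cache, 30 cycles for main memory, both taken from the ranges quoted above) are assumptions, and the model charges a miss the full main memory latency with no overlap.

```python
def average_access_time(chr_, cache_cycles, memory_cycles):
    """Average cycles per access under a simple two-level model.

    chr_ is the cache hit ratio (fraction of accesses satisfied from
    the cache). A miss is assumed to cost the full main memory latency.
    """
    if not 0.0 <= chr_ <= 1.0:
        raise ValueError("cache hit ratio must lie between 0 and 1")
    return chr_ * cache_cycles + (1.0 - chr_) * memory_cycles


if __name__ == "__main__":
    # Assumed example latencies: 1 cycle (cache), 30 cycles (main memory).
    for ratio in (1.0, 0.95, 0.5, 0.0):
        print(ratio, average_access_time(ratio, 1, 30))
```

Under these assumed latencies, even a modest drop in CHR moves the average access time a long way from the one-to-two-cycle figure of the cache.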

Given the size of the cache, the structure of the 
cache has to be decided in terms of line size (in bytes), 
the number of lines, and the set associativity. Numerous 
design trade-off considerations go into these decisions. 
For example, the line size is chosen so that it is suffi- 
ciently large since most references are sequential and 
make efficient use of prefetched data. If the line size is 
small, it results in more line misses, and hence more 
miss penalty, for the same amount of data that currently 
defines the program locality. Further, smaller lines result 
in more lines in the cache and have cost, complexity and 
performance implications in the cache directory design. 

The line size is chosen so that it is not too large,
since that may result in too few lines and hence would
restrict over how many disjoint regions the locality of
reference may be distributed. Further, if the line size is
large, each line miss will bring in a large number of data
elements, all of which may not be used during the line's
residency in the cache. This results in time and available
main memory bandwidth being spent unnecessarily for
data that will not be referenced.

The set associativity of the cache is selected to re- 
duce the probability of cache line thrashing situations. 
Line thrashing occurs when the current locality of refer- 
ence includes more lines from a congruence class that 
map into the same set than the level of associativity pro- 
vided. This results in the lines constantly displacing each 
other from the cache and thus driving down the CHR. 
The set associativity, on the other hand, cannot be arbi- 
trarily large since it has a bearing on the cost and com- 
plexity of the cache look-up mechanism. 
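
The thrashing condition described above can be sketched as follows. This is an illustration under stated assumptions, not part of the described system: it assumes the conventional mapping in which the set (congruence class) is the line address modulo the number of sets, and the function names and example geometry (128-byte lines, 64 sets) are hypothetical.

```python
from collections import Counter


def set_index(address, line_size, num_sets):
    """Congruence class (set) that a byte address maps to.

    Assumes the conventional modulo mapping of line address to set.
    """
    return (address // line_size) % num_sets


def overcommitted_sets(addresses, line_size, num_sets, ways):
    """Sets for which the locality needs more lines than `ways` provides."""
    # Distinct lines that make up the current locality of reference.
    lines = {a // line_size for a in addresses}
    per_set = Counter(line % num_sets for line in lines)
    return sorted(s for s, n in per_set.items() if n > ways)


if __name__ == "__main__":
    # Three lines spaced 64 * 128 = 8192 bytes apart all fall in set 0.
    locality = [0, 8192, 16384]
    print(overcommitted_sets(locality, 128, 64, ways=2))
```

With two-way associativity the three lines in this example compete for two slots in set 0 and keep displacing each other; with four ways the same locality fits.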



References by an instruction may exhibit poor cache
hit ratio for several reasons. For example, the instruction
is in a loop and references the elements of a data
structure with a non-unit stride. Classic examples are
references to elements along various directions of a
multi-dimensional matrix and referencing a single column
in a table of data. If the line size is L elements and the
stride s is greater than L, each line will fetch L elements,
only one of which will be utilized in the immediate future.
If the size of the data structure is large and/or there are
several other data structures being accessed by other
instructions in the loop, these references will tend to flush
the cache so that when another element in the same line
is referenced, it will already have been displaced. This
leads to situations where the cache hit ratio degrades to
close to zero and the latency approaches main store
access time, resulting in poor performance. Performance
is degraded further because, for each element utilized,
the cache mechanism fetches L-1 additional elements
that are never referenced while in the cache. This incurs
the delay for the additional fetches as well as the
deprivation of the available main memory bandwidth from
the other processors in the system. Moreover, the
increased cache coherence traffic can cause further
degradation in all processors in the system.
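
The waste caused by a non-unit stride can be quantified with a short sketch. The model below is an assumption made for illustration: references fall at element indices 0, s, 2s, ... and every touched line is fetched exactly once, which is the best case (with thrashing, lines are fetched repeatedly). The function name is hypothetical.

```python
def line_utilization(num_refs, stride, line_elems):
    """Fraction of fetched elements that are actually referenced.

    Models num_refs references at element indices 0, stride, 2*stride, ...
    with each touched line fetched exactly once.
    """
    if num_refs <= 0 or stride <= 0 or line_elems <= 0:
        raise ValueError("all arguments must be positive")
    touched_lines = {(i * stride) // line_elems for i in range(num_refs)}
    fetched_elements = len(touched_lines) * line_elems
    return num_refs / fetched_elements


if __name__ == "__main__":
    print(line_utilization(160, 1, 16))   # unit stride
    print(line_utilization(100, 32, 16))  # stride s greater than L
```

For a stride greater than L the utilization is 1/L, matching the statement that L-1 of every L fetched elements are never referenced.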

Another situation which causes poor cache hit ratio
is where the instructions in a loop reference several data
objects or several areas of the same data object that all
fall in the same congruence class. This can occur more
often than one may anticipate if the dimensions of the
data objects are a power of two. One can expect to see
more and more of that since in a parallel processing
system, the available processors are typically a power of
two. The natural tendency, then, is to have data objects
whose dimensions are also a power of two so as to make
it easy to partition them across the processors.

Additionally, striding through large data objects in a
non-unit stride direction causes not only the particular
instruction to experience poor hit ratio, but it can also
cause the code surrounding those instructions to suffer.
This is because the instructions with bad locality may
have flushed the cache of useful data.

It is therefore a desired aim of the present invention
to provide an automatic bypass for instructions which
exhibit poor cache hit ratio, thereby avoiding caching of
such data with a consequent improvement in performance.

Accordingly, the present invention provides a storage
control system for use in a computer system including
a main memory, a processor for requesting data from
the main memory and a cache interposed between the
processor and the memory for storing a subset of the
data in the main memory, the control system comprising:

cache control means for determining if the data
requested by the processor is in the cache, signifying
a cache hit, and if so, retrieving the requested
data from the cache, but otherwise retrieving the
requested data from main memory;

table means addressable by an instruction address
from the processor for keeping a record of status
associated with an instruction signifying whether the
data requested by that instruction is cacheable or
non-cacheable;

means responsive to the cache control means and
the table means for storing data retrieved from main
memory into said cache when the current status
associated with the requesting instruction is cacheable
and bypassing the cache when the current status
is noncacheable; and such a system is already
described in EP-A-0 412 247.

The present invention is characterized by the fea- 
tures as claimed. 

Thus, there is provided a heuristic mechanism that
avoids the caching of data for instructions whose data
references, for whatever reason, exhibit low cache hit
ratio. This is done automatically, without any intervention
from the programmer or compiler. The mechanism keeps
track of the data reference locality of each instruction to
decide if it should be made cacheable or non-cacheable.
By keeping references made by an instruction exhibiting
bad locality out of the cache, the processor does not
incur the performance penalty of fetching unnecessary
data in the whole line. This, in turn, avoids the inefficient
utilization of memory bandwidth as a result of fetching
useless data and flushing useful data in the cache. In
parallel programming environments, situations that
cause line thrashing between multiple processors are
reduced by not caching the data for poorly behaved
instructions.

A control system according to the invention preferably
adjusts itself over time, so that some of the references
of an instruction will be cached while others will
not. This results in keeping as much of the data object
in the cache as possible without flushing useful data.
Thus, for example, if the instruction is making passes
along a non-unit direction of a matrix of data that is much
larger than the cache, the scheme will tend to stabilize
as much of the matrix in cache as there is available space
while keeping the rest out of the cache by not caching
references to it. In this way it adjusts to make the best
use of the available cache space.

A preferred embodiment of the invention will now be 
described with reference to the accompanying drawings 
in which: 

Figure 1 is a block diagram showing the organization 
of a memory hierarchy including a cache memory; 

Figure 2 is a block diagram showing a reference his- 
tory table structure according to the present inven- 
tion; 



Figure 3 is a state transition diagram illustrating the
operation of the invention;

Figure 4 is a state table showing the states represented
by the state transition diagram of Figure 3;
and

Figure 5 is a block and logic diagram showing the
basic implementation of the present invention.

Referring now to the drawings, and more particularly
to Figure 1, there is shown a computer memory hierarchy
which includes a cache memory. A central processing
unit (CPU) 10 processes data stored in main memory 12
according to instructions, also stored in main memory 12.
The cache memory 14 is interposed between the CPU
10 and the main memory 12 and is a faster (and typically
more expensive) memory than main memory 12 and
stores a subset of the data and instructions stored in
main memory 12. The concept of using a cache memory
is based on the anticipation that data stored in cache
memory 14 is likely to be reused and, since the time for
the CPU 10 to access the cache memory is shorter than
the time for the CPU to access the main memory 12,
there is a consequent increase in performance in the
computer system. The cache memory 14, or simply
"cache", generally comprises an array of high-speed
memory devices 15 and a tag directory 16.

When the CPU 10 requests a new word, whether it
be data or an instruction, a check is first made in an
address tag directory 16 to determine if the word is in the
cache memory array 15. If so (i.e., a cache "hit"), the
word is read from the cache 14 directly to the CPU 10.
If not (i.e., a cache "miss"), the word must be accessed
from the main memory 12. Ordinarily, a word read from
main memory 12 is written into the cache 14 anticipating
that it will be referenced again in the near future.
Actually, it is customary to read a block of data containing
the word actually referenced rather than just the word itself.
This block of data is then written into the cache 14. When
a block of data is written into the cache 14, data already
in the cache 14 will be overwritten or "flushed". It is
therefore necessary to have some type of algorithm based on
history of use to identify the least necessary block of data
for overwriting. One such algorithm is the Least Recently
Used (LRU) algorithm.

According to the invention, an instruction may be in
"cacheable" or "non-cacheable" state based on its past
behavior. If an instruction generally gets cache hits, it is
classified as currently cacheable, and any misses it
experiences will result in the whole line being fetched into
the cache, on the expectation that the instruction will
continue to behave well and will reference the line again. If
an instruction generally gets cache misses, it is classified
as currently non-cacheable, and the misses it experiences
will result in only the required data element being
fetched directly from the main memory 12. The line is not
fetched into the cache 14 on the expectation that the
instruction will continue to behave badly and is unlikely to
reference the line again in the near future.

To enhance adaptability to varying situations, the
concept of "bonus for good behavior" is introduced. Its
function is to provide a threshold for deciding when to
switch from cacheable to non-cacheable. Thus, an
instruction that experiences cache hits is given a "bonus"
of some fixed or, in an adaptive variation, variable value
B. If an instruction is cacheable and currently has a
bonus of "b", up to b consecutive line misses will be
tolerated for its past "good behavior"; that is, it will remain in
cacheable state for the next b consecutive misses. If
there are more than b consecutive misses, the instruction
will move into the non-cacheable state on the (b+1)th
miss.

The fixed bonus parameter B could be a system se- 
lected value, or it may be a tuning parameter that can be 
specified for each job or made to vary within a job. The 
effect of the bonus parameter B is discussed below. 

A reference history table (RHT) 20, as shown in Figure 2,
is provided in the CPU 10 according to the invention.
The RHT 20 is addressed by an instruction address
in register 22 in the CPU. The structure shown in Figure
2 is specific to the IBM System/370-XA (extended
architecture) processors insofar as the number of bits (32)
of the instruction address that is used to address the RHT
20. It will be understood by those skilled in the art that
the structure can be trivially adjusted for different
processor architectures. For more information on the IBM
System/370 (S/370), reference may be had to IBM
System/370, Principles of Operation, IBM Publication No.
GA22-7000-10 (1987).

To stabilize large instruction loops, the RHT 20
needs to be only about 1K to 2K elements long. The RHT
20 is directly mapped using the instruction address in
register 22. Keeping in mind that S/370 instructions can
only lie on halfword boundaries, for a 2K entry RHT, bits
20-30 of the instruction address are used to index into
the RHT 20. Assuming that there is roughly an equal mix
of 2-byte and 4-byte instructions, a 2K entry RHT will
stabilize loops of around 1300 instructions.
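
The index selection and the loop-size estimate above can be worked through in a short sketch. It is an illustration under stated assumptions: bits are numbered in the S/370 manner (bit 0 is the most significant of 32, bit 31 the least significant), the average instruction length is taken as 3 bytes for an equal mix of 2-byte and 4-byte instructions, and the function names are hypothetical.

```python
RHT_ENTRIES = 2048  # 2K-entry table, as in the text (11 index bits)


def rht_index(instruction_address):
    """Index formed from bits 20-30 of a 32-bit instruction address.

    Assumes bit 0 is the most significant bit. Bit 31 is always zero
    for halfword-aligned instructions, so it is dropped by the shift.
    """
    return (instruction_address >> 1) & (RHT_ENTRIES - 1)


def loop_capacity(entries=RHT_ENTRIES, avg_instruction_bytes=3):
    """Approximate number of instructions spanned before indices wrap."""
    span_bytes = entries * 2  # one table entry per halfword
    return span_bytes // avg_instruction_bytes


if __name__ == "__main__":
    print(rht_index(0x0FFE), rht_index(0x1000))
    print(loop_capacity())
```

A 2K-entry table spans 4096 bytes of code before two instructions share an entry, which at an assumed 3 bytes per instruction is roughly 1365 instructions, in line with the figure of around 1300 quoted above.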

Each entry of the RHT 20 consists of three fields:
the last referenced line address (LRLA) field 24, the
STATE field 26 and the BONUS field 28.

The Last Referenced Line Address (LRLA) field 24 
contains the address of the last line referenced by the 
instruction. For S/370-XA and cache line size of 128 
bytes, the LRLA field can be a maximum of 24 bits long. 
However, the scheme will work as well if only the least 
significant six to ten bits of the line address are saved. 
This is enough to reduce the probability of two consec- 
utively referenced lines having the same six to ten least 
significant address bits to near zero. Further, a false 
match because of comparing only a few bits merely re- 
sults in the caching of a few additional lines that normally 
would have been bypassed. 
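
Saving only a few low-order bits of the line address can be sketched as below. This is a minimal illustration: the choice of eight saved bits (within the six-to-ten range mentioned above) and the function names are assumptions.

```python
LINE_SIZE = 128  # bytes, as in the example above


def lrla_tag(address, kept_bits=8, line_size=LINE_SIZE):
    """Low-order `kept_bits` bits of the line address of `address`."""
    line_address = address // line_size
    return line_address & ((1 << kept_bits) - 1)


def matches_last_line(address, stored_tag, kept_bits=8):
    """True if `address` appears to lie in the last referenced line."""
    return lrla_tag(address, kept_bits) == stored_tag


if __name__ == "__main__":
    tag = lrla_tag(0x12345)
    print(matches_last_line(0x12346, tag))              # same line
    print(matches_last_line(0x12345 + 128, tag))        # next line
    print(matches_last_line(0x12345 + 128 * 256, tag))  # false match
```

The last case shows the only cost of truncation: two lines exactly 2^8 lines apart look identical, so the second would be cached rather than bypassed.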

The State field 26 contains the current state of the
instruction. The state provides information about how the
referenced data is to be treated, cached or non-cached;
i.e., bring the whole line into cache or bypass the cache
and access the referenced data element directly from
main memory. The instruction's past behavior is mapped
into one of two basic states. The State field 26 is one bit
long.

The Bonus field 28 contains the current value of the
bonus associated with an instruction's past "good
behavior" (line hits). In the RHT diagram, the bonus field is
shown to be n bits, where n can be from 0 to log2(S) bits,
where S is the total number of lines in the cache. The
state diagram shown in Figure 3 explains how the value
of the bonus is manipulated.

An instruction's past behavior is mapped into one of
two primary states, cacheable or non-cacheable. State
"0" is for instructions that have exhibited good reference
locality. Data for instructions in state "0" is cached. An
instruction in state "0" can have a current bonus value
from 0 to B, inclusive. State "1" is for instructions that
have exhibited poor reference locality. Data for
instructions in state "1" is not cached. Only the referenced data
element is directly fetched from main memory, bypassing
the cache completely. Instructions in state "1" have a
bonus value of zero at all times. A variation of the bypass
scheme, discussed below, introduces a concept of a
fixed "penalty" associated with instructions in state "1".

The two states are explained in more detail below
with reference to Figures 2 and 3. In the cacheable state
30 (S=0), an instruction has been behaving well. Its
previous references have been hits to lines in cache 14.
Every time an instruction in the cacheable state 30 gets
a line hit, its bonus value is reset to B. If an instruction in
the cacheable state 30 gets a line miss and the current
value of the bonus, b, is greater than zero, the instruction
remains in the cacheable state but with the bonus
reduced by one to b-1. If an instruction in the cacheable
state 30 gets a line miss and the current value of the
bonus, b, is equal to zero, implying that the instruction has
experienced B consecutive line misses, the instruction
will move into the non-cacheable state 32 (S=1), with a
bonus of zero. These rules are summarized by the first
two rows of the table in Figure 4, where "C" means
cacheable and "NC" means non-cacheable.

An instruction in the non-cacheable state 32 has
exhibited poor data reference locality in the immediate
past. Its last B or more references have been to different
lines, and they were not found in cache. As long as it
does not reference the same line twice in succession or
it does not reference a line already in cache, the
instruction will remain in state 32 with a bonus of zero. Should
its next reference be to the previously referenced line,
whose address is in the LRLA field 24 (this will not be a
real hit, since the line was not fetched into cache 14
during the last reference), it will return to the cacheable state
(S=0) with a bonus of zero. This situation is referred to
as a "pseudo-hit". Should its next reference be to another
line that is already in cache, it moves to the cacheable
state 30 (S=0) with a bonus of B. These rules are
summarized in the last row of the table shown in Figure 4.
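
The transition rules for the two states can be collected into a single update function. The following is a sketch written from the rules stated in the text, not a description of the hardware: the class and function names are hypothetical, and the use of None as the initial LRLA value is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

CACHEABLE, NON_CACHEABLE = 0, 1  # state "0" and state "1" in the text


@dataclass
class RHTEntry:
    lrla: Optional[int] = None  # last referenced line address
    state: int = CACHEABLE
    bonus: int = 0


def update(entry, line, hit, B):
    """Return the new RHT entry after a reference to `line`.

    `hit` is the result of the normal cache directory look-up and
    B is the fixed bonus parameter.
    """
    if hit:
        # A real hit from either state gives the cacheable state, bonus B.
        state, bonus = CACHEABLE, B
    elif entry.state == CACHEABLE:
        if entry.bonus > 0:
            state, bonus = CACHEABLE, entry.bonus - 1  # miss is tolerated
        else:
            state, bonus = NON_CACHEABLE, 0  # bonus exhausted
    elif line == entry.lrla:
        state, bonus = CACHEABLE, 0  # pseudo-hit
    else:
        state, bonus = NON_CACHEABLE, 0  # remains non-cacheable
    return RHTEntry(lrla=line, state=state, bonus=bonus)


if __name__ == "__main__":
    e = RHTEntry(state=CACHEABLE, bonus=2)
    for line, hit in [(10, False), (11, False), (12, False),
                      (13, False), (13, False), (14, True)]:
        e = update(e, line, hit, B=2)
        print(line, hit, e.state, e.bonus)
```

With B=2 the example instruction tolerates two misses, becomes non-cacheable on the third, returns to the cacheable state with a bonus of zero on the pseudo-hit to line 13, and regains the full bonus on the real hit.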

Figure 5 shows an implementation of the automatic
cache bypass mechanism according to the invention. In
Figure 5, LRLA field 24 of the current RHT entry 51 is
supplied as one input to comparator 52, the other input
to which is the current line address. A match generates
a logical "1" at the output of comparator 52. The BONUS
field 28 of the current RHT entry 51 is supplied as one
input to comparator 53 which determines whether this
field is currently greater than zero. If so, the output of
comparator 53 is a logical "1". The outputs of the two
comparators 52 and 53 are supplied as inputs to NOR
gate 54 which also receives as inputs the STATE field 26
and the "hit" output of the cache mechanism (not shown).
The normal cache directory look-up takes place in
parallel with the RHT lookup and sends a control signal to
signal a cache hit or miss.

As is well known, the output of a NOR gate is a
logical "1" only when all its inputs are logical "0s". Therefore,
a logical "1" from any of the comparators 52 and 53, the
state field 26 or a cache hit will result in the output of the
NOR gate 54 being a logical "0". The output of the NOR
gate 54 is supplied back to the cache mechanism. A
logical "0" from the NOR gate 54 causes the cache
mechanism to cache the next line, but a logical "1" causes the
cache mechanism to bypass the cache. Thus, the
automatic cache bypass mechanism according to the
invention sends a signal back to the cache mechanism to
govern whether the current line is to be cacheable or
non-cacheable.

The output of the NOR gate 54 is also supplied to
the STATE field 26' of the new RHT entry 55. The LRLA
field 24' of the new RHT entry 55 is the current line
address. To generate the BONUS field 28' of the new RHT
entry 55, the BONUS field 28 of the current RHT entry
51 is supplied to selection circuit 56 which selects either
the value of zero or the value (b-1), whichever is greater.
The result of this selection is supplied to a multiplexer 57
as one input, the other input to which is the value B. The
select input or multiplexer control is the cache hit line. If
there is a cache hit (a logical "1") the value B is selected,
but if there is a cache miss (a logical "0") the output of
the selection circuit 56 is selected.

The RHT array design is determined by the rate of
operand address generation. Typically, this rate is one
address per cycle. Since each instruction needs an RHT
read and write cycle, the RHT should either be
implemented as an array that allows simultaneous read/write,
or it should be implemented with two or more interleaves.

As long as an instruction makes two or more
consecutive references to the same line, the line will always
be fetched into the cache, and the cache performance
with and without the bypass will be the same. If the
current locality of reference within an array is larger than the
cache (or the current cache space available to the array
in the presence of several competing instructions), the
normal cache mechanism will result in zero cache hits for
that array, with all the accompanying performance
degraders; namely, time spent to fetch the rest of the line,
memory bandwidth usurped from other processors,
flushing the cache of useful data, and line thrashing
situations.

The RHT scheme, on the other hand, would migrate
as much of the array into cache as possible, depending
on the locality of other instructions, and keep the rest of
it out so as to avoid the continuous flushing of the cache.
This increases the hit ratio on the part of the array that
stabilizes in cache. On the part of the array not in cache,
the scheme avoids the additional penalty of fetching the
unnecessary words in the line, increases the hit ratio on
the rest of the data by not flushing the cache, reduces
the bandwidth demand on the memory by fetching only
the required word, and avoids line thrashing situations
in parallel processing environments. This results in
better overall cache hit ratio seen by that instruction than
with the normal cache mechanism. All this helps to improve
the overall system performance.

The Bonus field was introduced as a means of
rewarding "good behavior". An instruction that has shown
recent good behavior, in the form of cache hits, will be
"forgiven" some "bad behavior" (i.e., cache misses) in
the future. The value of B will determine how quickly a
portion of the data object is stabilized in cache when the
referencing characteristics are bad. How much of the
data object is stabilized in cache will also be dependent
on the referencing characteristics of the other
instructions.

Consider first the case where B=1. As long as an
instruction is exhibiting bad locality, the RHT scheme
does well in keeping the array out of the cache. However,
with B=1, the migration of a part of the array into the
cache can be very slow, at the rate of one line per row
access. Thus, optimal use of the available cache space
would not be made.

To speed up the rate at which a portion of the matrix
that is being referenced with poor locality can be staged
into the cache, a value of B>1 should be used. Selection
of the optimal value of B is a trade-off between the speed
with which array sections are stabilized in cache, in spite
of poor referencing characteristics, versus increased
cache-to-memory traffic. If B is very large, say B=S
where S is the cache size in lines, the RHT mechanism
quickly stages in and stabilizes portions of arrays in the
cache; however, when the code is working with several
arrays with poor referencing patterns, a large value of B
can overcommit the available cache space. This can
result in some non-optimal decisions being made about
cacheability and non-cacheability, resulting in slightly
increased cache-to-memory traffic than would have
occurred with a smaller value of B. On the other hand, if
the value of B is too small, the migration and stabilization
of portions of the arrays can be slow, resulting in slightly
reduced cache hit ratio than if the value of B had been
higher. Note however, that in either situation, the
effective cache performance with the bypass mechanism, in
terms of cache hit ratio, memory bandwidth and memory
access time, is better than without it.

Simulation runs show that, in general, when working
with several arrays being referenced with poor locality, a
value of B around S/4 results in best average overall
performance, where S is the size of cache in lines.
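
As a worked example of the S/4 guideline, the sketch below derives a bonus value and the width of the Bonus field needed to hold it. The function names and the example cache (64 KB with 128-byte lines, giving S = 512) are assumptions for illustration.

```python
def suggested_bonus(cache_lines):
    """Bonus value B of about S/4 for a cache of S lines."""
    return cache_lines // 4


def bonus_field_bits(bonus):
    """Bits needed to hold any bonus value from 0 to `bonus` inclusive."""
    return bonus.bit_length()


if __name__ == "__main__":
    S = (64 * 1024) // 128  # assumed example: 64 KB cache, 128-byte lines
    B = suggested_bonus(S)
    print(S, B, bonus_field_bits(B))
```

For this assumed cache, B = 128 needs an 8-bit Bonus field, which is within the log2(S) = 9 bits mentioned for the field.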

The specific implementation shown in Figure 5 can
be modified to better optimize the RHT scheme. For
example, rather than having a fixed bonus value B, it can
be dynamically selected based on overall cache-miss
activity. If the cache miss activity can be monitored, the
value of B can be selected based on how many
instructions are currently experiencing bad reference
characteristics. The value of B can be increased when there are
few such instructions and reduced when there are many.
This allows a more dynamic trade-off between the speed
with which portions of data objects currently exhibiting
bad reference characteristics are stabilized in cache and
the overall cache-memory traffic.

In addition, in the basic RHT scheme, once an
instruction enters the non-cacheable state, it remains in
that state as long as it continues to experience repeated
cache misses. A variation on that uses the concept of a
limited penalty time. Once an instruction is marked as
non-cacheable, it remains so for at most "M" subsequent
executions. After that, it automatically moves to state
"C,m", i.e., the cacheable state with a bonus of m. M and m
are additional tuning parameters that allow the performance
of the cache to be fine-tuned.



Claims 

1. A storage control system for use in a computer system
including a main memory (12), a processor (10)
for requesting data from the main memory and a
cache (14) interposed between the processor and
the memory for storing a subset of the data in the
main memory, the control system comprising:

cache control means (16) for determining if the
data requested by the processor is in the cache,
signifying a cache hit, and if so, retrieving the
requested data from the cache, but otherwise
retrieving the requested data from main memory;

table means (20) addressable by an instruction
address from the processor for keeping a record
of status (26) associated with an instruction
signifying whether the data requested by that
instruction is cacheable or non-cacheable;

means responsive to the cache control means
and the table means for storing data retrieved
from main memory into said cache when the
current status associated with the requesting
instruction is cacheable and bypassing the
cache when the current status is noncacheable;
characterized in that

the table means (20) further keeps a bonus
value (28) also associated with an instruction;
and that

the control system further comprises means
(52-55) for changing the status of the instruction
as recorded in the table means as a function of
cache hits and the bonus value.

2. A control system as claimed in claim 1 further
comprising:

means for assigning an initial bonus value to an
instruction with a status of cacheable when first
storing data in said cache; and

means for modifying said bonus value as a function
of cache hits or cache misses for the
instruction.

3. A control system as claimed in claim 2 wherein the
means for modifying the bonus value reduces the
recorded bonus value for each cache miss and the
means for changing the status of an instruction
changes a status of cacheable, as recorded in the
table means, to noncacheable on a cache miss
when the bonus value is zero.

4. A control system as claimed in claim 3 wherein the
means for changing the status of an instruction
changes a status of noncacheable, as recorded in
said table means, to cacheable on a cache hit, said
bonus value remaining zero.

5. A control system as claimed in any preceding claim
wherein the bonus value is a tuning parameter which
is determined prior to running a program on the
computer system.

6. A control system as claimed in any preceding claim
wherein the bonus value is a tuning parameter which
is predefined within a program running on the
computer system.

7. A control system as claimed in any preceding claim
wherein the cache comprises S lines and the bonus
value is around S/4.

8. A control system as claimed in any preceding claim
wherein the bonus value is a tuning parameter which
is dynamically varied as a function of cache-miss
activity.

9. A control system as claimed in any preceding claim
wherein the means responsive to the cache control
means and the table means for storing data in the
cache comprises:

means for reading out of the table means the
status and bonus value for a line addressed by
an instruction address from the processor;

comparator means for determining whether the 
bonus value is greater than zero; and 

gating means responsive to the status, the com- 
parator means and the cache control means for 
generating a bypass signal to the cache control 
means. 

10. A control system as claimed in claim 9 further com- 
prising: 

selector means responsive to the bonus value 
read out of the table means for selecting the 
greater of zero or one less than the bonus value; 

multiplexer means having as inputs an output of 
the selector means and a predefined bonus 
value, the multiplexer means providing as an 
output the predefined bonus value when the 
cache control means indicates a cache hit but 
the output of the selector means when the 
cache control means indicates a cache miss; 
and 

means for updating the bonus value, as 
recorded in the table means, with the output of 
the multiplexer means. 

11. A control system as claimed in claim 10 wherein the 
means for changing the status of data, as recorded 
in the table means, of a line addressed by an instruc- 
tion address from the processor uses the bypass 
signal from the gating means to update the status. 

12. A computer system including a main memory; 

a processor for requesting data from the main 
memory; 

a cache interposed between the processor and 
the memory for storing a subset of data in the 
main memory; and 

a storage control system as claimed in any
preceding claim.



Patentanspruche 

1. Eine Speichersteuerungsanordnung für den Einsatz
in einem Computersystem, das einen Hauptspeicher
(12), einen Prozessor (10) zum Anfordern von
Daten aus dem Hauptspeicher und einen zwischen
Prozessor und Speicher zwischengeschalteten
Cache (14) zum Speichern einer Teilmenge der
Daten im Hauptspeicher aufweist, wobei die
Steuerungsanordnung umfaßt:

ein Cache-Steuerungsmittel (16) zum Ermitteln,
ob sich die vom Prozessor angeforderten Daten
im Cache befinden, so daß ein Cache-Hit
vorliegt, und wenn ja, Abrufen der angeforderten
Daten aus dem Cache, anderenfalls dagegen
Abrufen der angeforderten Daten aus dem
Hauptspeicher;

ein Tabellenmittel (20), das mit einer
Anweisungsadresse aus dem Prozessor adressierbar
ist, zum Führen einer Aufzeichnung des zu
einer Anweisung gehörenden Status (26), der
anzeigt, ob die von der Anweisung angeforderten
Daten cachefähig sind oder nicht;

ein Mittel, das auf das Cache-Steuerungsmittel
und das Tabellenmittel reagiert, zum Speichern
der aus dem Hauptspeicher abgerufenen Daten
in dem Cache, wenn der aktuelle Status, der zu
der anfordernden Anweisung gehört, "cachefähig"
lautet, und Umgehen des Caches, wenn
der aktuelle Status "nicht cachefähig" lautet;

dadurch gekennzeichnet, daß

das Tabellenmittel (20) ferner einen Bonuswert
(28) führt, der ebenfalls zu einer Anweisung
gehört, und daß die Steuerungsanordnung ferner
Mittel (52-55) umfaßt, um den Status der
Anweisung, der in dem Tabellenmittel
aufgezeichnet ist, in Abhängigkeit von den
Cache-Hits und dem Bonuswert zu ändern.



2. Eine Steuerungsanordnung nach Anspruch 1, die
ferner umfaßt:

ein Mittel zum Zuweisen eines Anfangsbonuswertes
zu einer Anweisung mit dem Status
"cachefähig", wenn in dem Cache erstmals
Daten gespeichert werden; und

ein Mittel zum Modifizieren des Bonuswertes in
Abhängigkeit von Cache-Hits oder Cache-Misses
für die Anweisung.

3. Eine Steuerungsanordnung nach Anspruch 2, bei
der das Mittel zum Modifizieren des Bonuswertes
den aufgezeichneten Bonuswert für jeden
Cache-Miss reduziert und das Mittel zum Ändern
des Status einer Anweisung den in dem
Tabellenmittel gespeicherten Status "cachefähig" bei einem
Cache-Miss in "nicht cachefähig" umwandelt, wenn
der Bonuswert null ist.

4. Eine Steuerungsanordnung nach Anspruch 3, bei
der das Mittel zum Ändern des Status einer
Anweisung den in dem Tabellenmittel gespeicherten
Status "nicht cachefähig" bei einem Cache-Hit in
"cachefähig" ändert, wobei der Bonuswert null
bleibt.

5. Eine Steuerungsanordnung nach einem der
vorangegangenen Ansprüche, bei der der Bonuswert ein
Abstimmungsparameter ist, der ermittelt wird, bevor
ein Programm auf dem Computersystem läuft.

6. Eine Steuerungsanordnung nach einem der
vorangegangenen Ansprüche, bei der der Bonuswert ein
Abstimmungsparameter ist, der innerhalb eines
Programms vorbestimmt ist, das auf dem
Computersystem läuft.

7. Eine Steuerungsanordnung nach einem der
vorangegangenen Ansprüche, bei der der Cache S Zeilen
umfaßt und der Bonuswert bei etwa S/4 liegt.

8. Eine Steuerungsanordnung nach einem der
vorangegangenen Ansprüche, bei der der Bonuswert ein
Abstimmungsparameter ist, der in Abhängigkeit von
der Cache-Miss-Aktivität dynamisch verändert wird.

9. Eine Steuerungsanordnung nach einem der
vorangegangenen Ansprüche, bei der das auf das
Cache-Steuerungsmittel und das Tabellenmittel
reagierende Mittel zum Speichern von Daten im
Cache umfaßt:

ein Mittel zum Auslesen des Status und des
Bonuswertes für eine Zeile, die mit einer
Anweisungsadresse aus dem Prozessor adressiert
wurde, aus dem Tabellenmittel;

ein Komparatormittel zum Ermitteln, ob der
Bonuswert größer als null ist; und

ein Gattersteuerungsmittel, das auf den Status,
das Komparatormittel und das
Cache-Steuerungsmittel reagiert, zum Erzeugen eines
Umgehungssignals an das
Cache-Steuerungsmittel.

10. Eine Steuerungsanordnung nach Anspruch 9, die
ferner umfaßt:

ein Auswahlmittel, das auf den aus dem
Tabellenmittel ausgelesenen Bonuswert reagiert,
zum Auswählen des Wertes null oder, wenn
dieser größer ist, des Wertes, der um eins geringer
ist als der Bonuswert;

ein Multiplexermittel, das als Eingabe eine
Ausgabe des Auswahlmittels und einen
vorbestimmten Bonuswert hat, wobei das
Multiplexermittel als Ausgabe den vorbestimmten
Bonuswert liefert, wenn das
Cache-Steuerungsmittel einen Cache-Hit anzeigt, jedoch die
Ausgabe des Auswahlmittels liefert, wenn das
Cache-Steuerungsmittel einen Cache-Miss
anzeigt; und

ein Mittel zum Aktualisieren des in dem
Tabellenmittel aufgezeichneten Bonuswertes mit der
Ausgabe des Multiplexermittels.

11. Eine Steuerungsanordnung nach Anspruch 10, bei
der das Mittel zum Ändern des in dem Tabellenmittel
aufgezeichneten Status der Daten einer Zeile, die
mit einer Anweisungsadresse aus dem Prozessor
adressiert wird, mit Hilfe des Umgehungssignals
aus dem Gattersteuerungsmittel den Status
aktualisiert.

12. Ein Computersystem mit einem Hauptspeicher;

einem Prozessor zum Anfordern von Daten aus
dem Hauptspeicher;

einem zwischen Prozessor und Speicher
zwischengeschalteten Cache zum Speichern einer
Teilmenge von Daten im Hauptspeicher; und

einer Speichersteuerungsanordnung nach
einem der vorangegangenen Ansprüche.



Revendications 


1. Système de commande de stockage destiné à être
utilisé dans un système d'ordinateur comportant une
mémoire principale (12), un processeur (10) pour
demander des données à partir de la mémoire
principale et une anté-mémoire (14) interposée entre le
processeur et la mémoire pour stocker une
sous-série des données dans la mémoire principale,
le système de commande comprenant:

un moyen (16) de commande d'anté-mémoire
pour déterminer si les données demandées par
le processeur se trouvent dans l'anté-mémoire,
signifiant une correspondance de
l'anté-mémoire, et si oui, recherche des
données demandées à partir de l'anté-mémoire,
mais sinon, recherche des données
demandées à partir de la mémoire principale;

un moyen (20) formant table adressable par une
adresse d'instruction en provenance du
processeur pour garder un enregistrement de l'état
(26) associé à une instruction signifiant si les
données demandées par cette instruction
peuvent ou non être mises en anté-mémoire;

un moyen sensible au moyen de commande
d'anté-mémoire et au moyen formant table pour
stocker les données recherchées à partir de la
mémoire principale dans ladite anté-mémoire
lorsque l'état en cours associé à l'instruction de
demande peut être mis en anté-mémoire et en
court-circuitant l'anté-mémoire lorsque l'état en
cours ne peut pas être mis en anté-mémoire;

caractérisé en ce que

le moyen formant table (20) garde encore une
valeur de bonus (28) associée aussi à une
instruction et en ce que le système de commande
comprend en outre un moyen (52-55) pour
modifier l'état de l'instruction telle
qu'enregistrée dans le moyen formant table en fonction
des correspondances de l'anté-mémoire et de
la valeur de bonus.

2. Système de commande selon la revendication 1,
comprenant en outre:

un moyen pour assigner une valeur de bonus
initiale à une instruction avec un état de
possibilité de mise en anté-mémoire lors du premier
stockage de données dans ladite
anté-mémoire; et

un moyen pour modifier ladite valeur de bonus
en fonction des correspondances de
l'anté-mémoire ou des accès manqués de
l'anté-mémoire pour l'instruction.

3. Système de commande selon la revendication 2,
dans lequel le moyen pour modifier la valeur de
bonus réduit la valeur de bonus enregistrée pour
chaque accès manqué de l'anté-mémoire et le
moyen pour modifier l'état d'une instruction modifie
l'état d'une possibilité de mise en anté-mémoire, tel
qu'enregistré dans le moyen formant table, en non
possibilité de mise en anté-mémoire sur un accès
manqué à l'anté-mémoire lorsque la valeur de
bonus est de zéro.

4. Système de commande selon la revendication 3,
dans lequel le moyen pour modifier l'état d'une
instruction modifie l'état de non possibilité de mise en
anté-mémoire, tel qu'enregistré dans ledit moyen
formant table, en une possibilité de mise en
anté-mémoire sur un accès manqué à
l'anté-mémoire, ladite valeur de bonus restant à
zéro.

5. Système de commande selon l'une quelconque des
revendications précédentes, dans lequel la valeur
de bonus est un paramètre d'accord qui est
déterminé avant de faire tourner un programme sur le
système d'ordinateur.

6. Système de commande selon l'une quelconque des
revendications précédentes, dans lequel la valeur
de bonus est un paramètre d'accord qui est prédéfini
à l'intérieur d'un programme tournant sur le système
d'ordinateur.

7. Système de commande selon l'une quelconque des
revendications précédentes, dans lequel
l'anté-mémoire comprend S lignes et la valeur de
bonus est aux alentours de S/4.

8. Système de commande selon l'une quelconque des
revendications précédentes, dans lequel la valeur
de bonus est un paramètre d'accord qui varie de
manière dynamique en fonction de l'activité de
l'accès manqué à l'anté-mémoire.

9. Système de commande selon l'une quelconque des
revendications précédentes, dans lequel le moyen
sensible au moyen de commande d'anté-mémoire
et le moyen formant table pour stocker des données
dans l'anté-mémoire comprennent:

un moyen pour lire dans le moyen formant table
l'état et la valeur de bonus pour une ligne
adressée par une adresse d'instruction en
provenance du processeur;

un moyen comparateur pour déterminer si la
valeur de bonus est supérieure à zéro; et

un moyen de déclenchement sensible à l'état,
au moyen comparateur et au moyen de
commande de l'anté-mémoire pour générer un
signal de court-circuit pour le moyen de
commande de l'anté-mémoire.

10. Système de commande selon la revendication 9,
comprenant en outre:

un moyen sélecteur sensible à la valeur de
bonus lue dans le moyen formant table pour
sélectionner le plus grand de zéro ou un moins
la valeur de bonus;

un moyen multiplexeur ayant en entrées une
sortie du moyen sélecteur et une valeur de
bonus prédéfinie, le moyen multiplexeur
fournissant en sortie la valeur de bonus prédéfinie
lorsque le moyen de commande de
l'anté-mémoire indique une correspondance de
l'anté-mémoire, mais la sortie du moyen
sélecteur lorsque le moyen de commande de
l'anté-mémoire indique un accès manqué à
l'anté-mémoire; et

un moyen pour mettre à jour la valeur de bonus,
telle qu'enregistrée dans le moyen formant
table, avec la sortie du moyen multiplexeur.

11. Système de commande selon la revendication 10,
dans lequel le moyen pour modifier l'état des
données, tel qu'enregistré dans le moyen formant table,
d'une ligne adressée par une adresse d'instruction
en provenance du processeur utilise le signal de
court-circuit en provenance du moyen de
déclenchement pour mettre à jour l'état.

12. Système d'ordinateur comportant une mémoire
principale;

un processeur pour demander des données à
partir de la mémoire principale;

une anté-mémoire interposée entre le
processeur et la mémoire pour stocker une sous-série
des données dans la mémoire principale; et

un système de commande de stockage tel que
revendiqué dans l'une quelconque des
revendications précédentes.